
aaronsteers
Contributor

@aaronsteers aaronsteers commented Aug 1, 2025

feat: add install CLI command for pre-installing connectors

Summary

This PR adds a new install CLI command to PyAirbyte that allows pre-installing connectors, particularly useful for front-loading installation costs during image build processes. The command accepts generic connector arguments that work for both sources and destinations.

Key Features:

  • New pyab install command with comprehensive connector installation options
  • Supports all installation methods: pip URLs, Docker images, local executables, and YAML manifests
  • Includes --use-python option for Python interpreter selection
  • Uses get_connector_executor() directly, for efficiency and a type-agnostic approach
  • Proper error handling following existing CLI patterns

Example Usage:

# Install latest version of a connector
pyab install --connector=source-hardcoded-records

# Install with specific Python interpreter
pyab install --connector=source-hardcoded-records --use-python=true

# Install from pip URL
pyab install --connector=source-hardcoded-records --pip-url="airbyte-source-hardcoded-records==0.0.30"
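For the image-build use case, a build step could pre-install several connectors in one go. The sketch below only assembles and optionally runs `pyab install` calls using the `--connector`, `--version`, and `--use-python` flags that appear in this PR; `build_install_argv` and `preinstall_all` are hypothetical helpers, and `pyab` is assumed to be on `PATH` inside the build image.

```python
import shlex
import subprocess
from typing import List, Optional


def build_install_argv(
    connector: str,
    version: Optional[str] = None,
    use_python: Optional[str] = None,
) -> List[str]:
    """Build the argv for a single `pyab install` call (hypothetical helper)."""
    argv = ["pyab", "install", f"--connector={connector}"]
    if version is not None:
        argv.append(f"--version={version}")
    if use_python is not None:
        argv.append(f"--use-python={use_python}")
    return argv


def preinstall_all(connectors: List[str], dry_run: bool = True) -> List[str]:
    """Pre-install each connector in order and return the equivalent shell lines.

    With dry_run=True nothing is executed, which is handy for generating
    Dockerfile RUN lines or for testing the command construction.
    """
    lines = []
    for name in connectors:
        argv = build_install_argv(name, use_python="true")
        lines.append(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError, so the build step fails fast.
            subprocess.run(argv, check=True)
    return lines
```

Running `preinstall_all([...], dry_run=False)` in a build step, or pasting the returned lines into `RUN` instructions, front-loads the installs. Whether later commands then skip reinstallation is exactly what the last checklist item below asks reviewers to verify.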

Review & Testing Checklist for Human

  • Test different connector types: Verify the command works with both source and destination connectors
  • Test installation methods: Try pip URLs, Docker images, local executables, and manifest files
  • Test --use-python parameter: Verify behavior with 'true', 'false', paths, and version strings like '3.11'
  • Test error handling: Try invalid connector names, non-existent versions, and malformed parameters
  • Verify image build use case: Test that installed connectors can be used without re-installation in subsequent commands
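The `--use-python` cases in the checklist above ('true', 'false', paths, and version strings like '3.11') imply a string-to-value mapping along these lines. This is a hypothetical sketch: the PR calls `_parse_use_python()`, but its body isn't shown here, so the version-string regex and the meaning attached to each case are assumptions.

```python
import re
from pathlib import Path
from typing import Optional, Union

# Assumed shape of a bare Python version such as "3.11" or "3.11.10".
_VERSION_RE = re.compile(r"^\d+\.\d+(\.\d+)?$")


def parse_use_python(value: Optional[str]) -> Union[bool, Path, str, None]:
    """Map a --use-python string to a bool, an interpreter Path, or a version string."""
    if value is None:
        return None  # Option not given: let the executor choose its default.
    lowered = value.strip().lower()
    if lowered == "true":
        return True  # Assumed meaning: use the current Python interpreter.
    if lowered == "false":
        return False  # Assumed meaning: don't use a Python venv (e.g. use Docker).
    if _VERSION_RE.match(lowered):
        return lowered  # A version string for a managed interpreter.
    return Path(value)  # Anything else is treated as a path to an interpreter.
```

Checking the version pattern before falling back to `Path` keeps "3.11" from being misread as a relative file path.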

Recommended Test Plan:

  1. Test basic installation: pyab install --connector=source-hardcoded-records
  2. Test with use-python: pyab install --connector=source-hardcoded-records --use-python=true
  3. Test invalid version to verify error handling: pyab install --connector=source-hardcoded-records --version=999.999.999
  4. Test that installed connector works in subsequent operations without re-installation

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    CLI["airbyte/cli.py"]:::major-edit
    Executor["airbyte/_executors/util.py<br/>get_connector_executor()"]:::context
    VenvExec["airbyte/_executors/python.py<br/>VenvExecutor"]:::context
    DockerExec["airbyte/_executors/docker.py<br/>DockerExecutor"]:::context
    
    CLI -->|"calls directly"| Executor
    Executor -->|"creates appropriate"| VenvExec
    Executor -->|"creates appropriate"| DockerExec
    
    VenvExec -->|"executor.install()"| InstallVenv["Virtual Environment<br/>Installation"]:::context
    DockerExec -->|"executor.install()"| InstallDocker["Docker Image<br/>Installation"]:::context

    subgraph Legend
        L1[Major Edit]:::major-edit
        L2[Minor Edit]:::minor-edit  
        L3[Context/No Edit]:::context
    end

    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

  • The install command uses get_connector_executor() directly rather than get_source()/get_destination() for efficiency and type-agnostic support
  • All CLI option parsing follows existing patterns from other commands (validate, benchmark, sync)
  • Error handling uses PyAirbyteInputError consistent with other CLI commands
  • Tested manually with source-hardcoded-records connector and various parameter combinations
  • Unit tests (182 passed) and integration tests (8 passed) confirm no regressions

Summary by CodeRabbit

  • New Features
    • Added a new CLI command to install connectors with options for specifying name, version, pip URL, Docker image, local executable, and other advanced installation parameters. Users receive clear success or error messages during the installation process.

- Add new 'install' command that accepts generic connector arguments
- Support all installation methods: pip, docker, local executable, manifest
- Include --use-python option for Python interpreter selection
- Useful for pre-installing connectors during image build processes
- Uses get_connector_executor() directly, for efficiency and a type-agnostic approach
- Follows existing CLI patterns with proper error handling

Co-Authored-By: AJ Steers <[email protected]>
@Copilot Copilot AI review requested due to automatic review settings August 1, 2025 16:32
Contributor

Original prompt from AJ Steers
@Devin - Add an 'install' CLI command that can be used to pre-install connectors. The most compelling use case is to run during an image build process, so as to front-load installation costs. We should accept generic connector args, for both sources and destinations

Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.



github-actions bot commented Aug 1, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1754025312-add-install-cli-command' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1754025312-add-install-cli-command'

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test-pr - Runs tests with the updated PyAirbyte

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.


Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a new install CLI command to PyAirbyte that allows pre-installing connectors without running them, which is particularly useful for optimizing container image builds by front-loading installation costs.

  • Adds pyab install command with comprehensive connector installation options
  • Supports all installation methods including pip URLs, Docker images, local executables, and YAML manifests
  • Uses get_connector_executor() directly for type-agnostic connector handling

Comment on lines +698 to +720
    docker_image_param: bool | str | None = None
    if docker_image == "true":
        docker_image_param = True
    elif docker_image:
        docker_image_param = docker_image

    source_manifest_param: bool | str | None = None
    if source_manifest == "true":
        source_manifest_param = True
    elif source_manifest:
        source_manifest_param = source_manifest

    use_python_param = _parse_use_python(use_python)

    try:
        executor = get_connector_executor(
            name=connector,
            version=version,
            pip_url=pip_url,
            local_executable=local_executable,
            docker_image=docker_image_param,
            use_host_network=use_host_network,
            source_manifest=source_manifest_param,

Copilot AI Aug 1, 2025


[nitpick] The variable name docker_image_param is verbose and the pattern of converting string parameters to typed parameters could be extracted into a helper function to reduce code duplication and improve maintainability.

Suggested change
-    docker_image_param: bool | str | None = None
-    if docker_image == "true":
-        docker_image_param = True
-    elif docker_image:
-        docker_image_param = docker_image
-    source_manifest_param: bool | str | None = None
-    if source_manifest == "true":
-        source_manifest_param = True
-    elif source_manifest:
-        source_manifest_param = source_manifest
-    use_python_param = _parse_use_python(use_python)
-    try:
-        executor = get_connector_executor(
-            name=connector,
-            version=version,
-            pip_url=pip_url,
-            local_executable=local_executable,
-            docker_image=docker_image_param,
-            use_host_network=use_host_network,
-            source_manifest=source_manifest_param,
+    # Use helper function below to parse docker_image and source_manifest
+    use_python_param = _parse_use_python(use_python)
+    def parse_bool_or_str(val: str | None) -> bool | str | None:
+        if val == "true":
+            return True
+        elif val:
+            return val
+        return None
+    docker_image = parse_bool_or_str(docker_image)
+    source_manifest = parse_bool_or_str(source_manifest)
+    try:
+        executor = get_connector_executor(
+            name=connector,
+            version=version,
+            pip_url=pip_url,
+            local_executable=local_executable,
+            docker_image=docker_image,
+            use_host_network=use_host_network,
+            source_manifest=source_manifest,


Comment on lines +698 to +708
    docker_image_param: bool | str | None = None
    if docker_image == "true":
        docker_image_param = True
    elif docker_image:
        docker_image_param = docker_image

    source_manifest_param: bool | str | None = None
    if source_manifest == "true":
        source_manifest_param = True
    elif source_manifest:
        source_manifest_param = source_manifest

Copilot AI Aug 1, 2025


[nitpick] Similar to docker_image_param, this parameter conversion logic is duplicated. Consider extracting the pattern of converting 'true' string to boolean into a reusable helper function.

Suggested change
-    docker_image_param: bool | str | None = None
-    if docker_image == "true":
-        docker_image_param = True
-    elif docker_image:
-        docker_image_param = docker_image
-    source_manifest_param: bool | str | None = None
-    if source_manifest == "true":
-        source_manifest_param = True
-    elif source_manifest:
-        source_manifest_param = source_manifest
+    docker_image_param: bool | str | None = _parse_bool_or_str(docker_image)
+    source_manifest_param: bool | str | None = _parse_bool_or_str(source_manifest)


        executor.install()
        print(f"Connector '{connector}' installed successfully!", file=sys.stderr)

    except Exception as e:

Copilot AI Aug 1, 2025


Catching the broad Exception class may hide specific errors that could be handled differently. Consider catching more specific exceptions or at least preserving the original exception type in logs for better debugging.

Suggested change
-    except Exception as e:
+    except Exception as e:
+        print("An unexpected error occurred during connector installation:", file=sys.stderr)
+        traceback.print_exc(file=sys.stderr)

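One way to act on this suggestion is to separate expected input errors (which the PR reports via `PyAirbyteInputError`) from unexpected failures, printing a full traceback only for the latter and returning a non-zero exit code in both cases. In this sketch `InputError` is a stand-in for `PyAirbyteInputError`, and `run_install` and its exit codes are hypothetical.

```python
import sys
import traceback
from typing import Callable


class InputError(Exception):
    """Stand-in for PyAirbyteInputError (assumption for this sketch)."""


def run_install(connector: str, install: Callable[[], None]) -> int:
    """Run an install callable and map its outcome to a process exit code."""
    try:
        install()
    except InputError as e:
        # Expected, user-facing problem: a one-line message is enough.
        print(f"Invalid input for '{connector}': {e}", file=sys.stderr)
        return 2
    except Exception:
        # Unexpected failure: keep the full traceback for debugging.
        print(f"Unexpected error installing '{connector}':", file=sys.stderr)
        traceback.print_exc(file=sys.stderr)
        return 1
    print(f"Connector '{connector}' installed successfully!", file=sys.stderr)
    return 0
```

Ordering the `except` clauses from most to least specific is what keeps input errors from being swallowed by the generic handler.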

Contributor

coderabbitai bot commented Aug 1, 2025


📥 Commits

Reviewing files that changed from the base of the PR and between d3e9f59 and fc649a8.

📒 Files selected for processing (1)
  • airbyte/_executors/docker.py (2 hunks)
Walkthrough

A new install command was introduced to the airbyte.cli module, enabling users to pre-install connectors with various input parameters. The implementation includes argument parsing, normalization, executor acquisition, installation invocation, error handling, and CLI registration.

Changes

Cohort / File(s): New CLI Install Command (airbyte/cli.py)
Change Summary: Added install CLI command for pre-installing connectors; includes argument parsing, normalization of inputs, executor acquisition with install_if_missing=True, installation call, validation step, error handling, and CLI group registration.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant Executor

    User->>CLI: pyab install [options]
    CLI->>CLI: Parse and normalize arguments
    CLI->>Executor: get_connector_executor(install_if_missing=True)
    Executor-->>CLI: Return executor instance
    CLI->>Executor: install()
    Executor-->>CLI: Installation result
    CLI->>Executor: If validate, run spec command
    Executor-->>CLI: Spec output or error
    CLI->>User: Print success, warning, or error message

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Suggested reviewers

  • aaronsteers

Would you like me to suggest some tests or documentation improvements for this new CLI command, wdyt?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte/cli.py (1)

698-708: Consider improving boolean string normalization consistency - wdyt?

The current normalization only handles "true" but not "false" for both docker_image and source_manifest. Should we also handle "false" explicitly to convert it to False? This would make the behavior more predictable and consistent.

Also, since this pattern is duplicated, would you consider extracting it to a helper function like _parse_boolean_or_string(value: str | None) -> bool | str | None?

+def _parse_boolean_or_string(value: str | None) -> bool | str | None:
+    """Parse a string that could be 'true', 'false', or an actual value."""
+    if value == "true":
+        return True
+    elif value == "false":
+        return False
+    elif value:
+        return value
+    return None
+
 def install(
     connector: str,
     version: str | None = None,
     pip_url: str | None = None,
     docker_image: str | None = None,
     local_executable: str | None = None,
     *,
     use_host_network: bool = False,
     source_manifest: str | None = None,
     use_python: str | None = None,
 ) -> None:
     """CLI command to install a connector."""
-    docker_image_param: bool | str | None = None
-    if docker_image == "true":
-        docker_image_param = True
-    elif docker_image:
-        docker_image_param = docker_image
-
-    source_manifest_param: bool | str | None = None
-    if source_manifest == "true":
-        source_manifest_param = True
-    elif source_manifest:
-        source_manifest_param = source_manifest
+    docker_image_param = _parse_boolean_or_string(docker_image)
+    source_manifest_param = _parse_boolean_or_string(source_manifest)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6189e9c and 8d93ccc.

📒 Files selected for processing (1)
  • airbyte/cli.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Pytest (All, Python 3.11, Windows)
  • GitHub Check: Pytest (All, Python 3.10, Windows)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (4)
airbyte/cli.py (4)

72-72: LGTM on the import addition!

The import of get_connector_executor is correctly placed and necessary for the new install command functionality.


636-685: Comprehensive command definition with good option coverage!

The install command options cover all the necessary parameters for different connector installation methods. The help text is clear and follows the established patterns in the codebase.


745-745: Perfect command registration!

The install command is properly registered with the CLI group, following the established pattern.


713-734: No action needed—get_connector_executor handles all passed parameters correctly

I’ve reviewed the implementation in airbyte/_executors/util.py and confirmed that:

  • The signature matches your call site exactly.
  • It enforces mutual exclusivity between local_executable, docker_image, pip_url/use_python, and source_manifest.
  • It applies sane defaults (local executable → manifest → Python → Docker) when no method is specified.
  • It validates combinations like version + pip_url or tagged docker_image + version, raising PyAirbyteInputError as needed.
  • It respects use_host_network in the Docker branch and honors install_if_missing=True for the virtual-env path.

All looks good to me—let me know if you’d like a deeper dive! wdyt?
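The review describes two rules in `get_connector_executor`: installation methods are mutually exclusive, and when none is given a default is chosen in the order local executable → manifest → Python → Docker. Below is a simplified, hypothetical model of only those two rules. `choose_method`, `InputError` (standing in for `PyAirbyteInputError`), and the `supported` set (standing in for whatever registry or `PATH` checks the real function performs) are all assumptions, and the version/tag validation is omitted.

```python
from typing import FrozenSet, Optional


class InputError(ValueError):
    """Stand-in for PyAirbyteInputError (assumption for this sketch)."""


# Fallback order as quoted in the review; the real resolution logic may differ.
DEFAULT_ORDER = ("local_executable", "source_manifest", "python", "docker")


def _given(value: object) -> bool:
    """Treat None and False as 'not requested'."""
    return value is not None and value is not False


def choose_method(
    *,
    local_executable: Optional[str] = None,
    docker_image: object = None,
    pip_url: Optional[str] = None,
    use_python: object = None,
    source_manifest: object = None,
    supported: FrozenSet[str] = frozenset(DEFAULT_ORDER),
) -> str:
    """Resolve which installation method a set of CLI inputs maps to."""
    requested = {
        "local_executable": _given(local_executable),
        "docker": _given(docker_image),
        "python": _given(pip_url) or _given(use_python),
        "source_manifest": _given(source_manifest),
    }
    chosen = [name for name, flag in requested.items() if flag]
    if len(chosen) > 1:
        # Mutually exclusive: refuse rather than guess which one should win.
        raise InputError(f"Conflicting install methods: {sorted(chosen)}")
    if chosen:
        return chosen[0]
    # Nothing explicit: fall back to the first method the connector supports.
    for method in DEFAULT_ORDER:
        if method in supported:
            return method
    raise InputError("No supported installation method for this connector.")
```

Because the `install` command forwards every option straight to this single resolver, it stays agnostic about whether the connector is a source or a destination.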


github-actions bot commented Aug 1, 2025

PyTest Results (Fast Tests Only, No Creds)

301 tests ±0: 301 ✅ passed ±0, 0 💤 skipped ±0, 0 ❌ failed ±0
1 suite ±0, 1 file ±0, duration 4m 23s ⏱️ +9s

Results for commit fc649a8. ± Comparison against base commit 6189e9c.

♻️ This comment has been updated with latest results.

- Add validation option that defaults to True (--validate)
- Run spec command after installation to verify connector functionality
- Report installation success separately from validation success
- Show validation failures as warnings to allow manual debugging
- Use executor.execute(['spec']) for validation instead of creating a connector object
- Fix linting issues with noqa comment for argument count

Co-Authored-By: AJ Steers <[email protected]>
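The validation flow in this commit (install, then run `spec`, and report a failed `spec` as a warning rather than a hard error) can be sketched against a minimal executor interface. The `Executor` protocol below is an assumption modeled on the `install()` and `execute(['spec'])` calls named in the commit; the real executor classes may differ in signatures and return types, and `install_and_validate` is hypothetical.

```python
import sys
from typing import Iterable, List, Protocol


class Executor(Protocol):
    """Assumed minimal interface, modeled on the calls named in the commit."""

    def install(self) -> None: ...

    def execute(self, args: List[str]) -> Iterable[str]: ...


def install_and_validate(executor: Executor, name: str, *, validate: bool = True) -> str:
    """Install, then optionally run `spec`.

    Returns "installed", "validated", or "warning" (installed, but spec failed).
    Installation errors are not caught here and propagate to the caller.
    """
    executor.install()
    print(f"Connector '{name}' installed successfully!", file=sys.stderr)
    if not validate:
        return "installed"
    try:
        # If execute() returns a lazy generator, iterating is what runs spec.
        for _line in executor.execute(["spec"]):
            pass
    except Exception as e:
        # Report, don't fail: the install already succeeded and can be debugged.
        print(f"Warning: '{name}' installed, but spec failed: {e}", file=sys.stderr)
        return "warning"
    print(f"Connector '{name}' passed validation (spec).", file=sys.stderr)
    return "validated"
```

Keeping the two outcomes separate means an image build can still succeed when `spec` fails for environment-specific reasons, while the warning leaves a trail for manual debugging.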

github-actions bot commented Aug 1, 2025

PyTest Results (Full)

364 tests ±0: 350 ✅ passed ±0, 14 💤 skipped ±0, 0 ❌ failed ±0
1 suite ±0, 1 file ±0, duration 20m 42s ⏱️ -31s

Results for commit fc649a8. ± Comparison against base commit 6189e9c.


- Replace NotImplementedError with actual Docker image management
- Use 'docker pull' for install() to download images
- Use 'docker rmi' for uninstall() to remove local images
- Add proper error handling and logging
- Handle cases where Docker is not installed or images don't exist
- Gracefully handle missing images during uninstall

Co-Authored-By: AJ Steers <[email protected]>
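The Docker behavior described in this commit (`docker pull` on install, `docker rmi` on uninstall, tolerating a missing Docker binary or an already-removed image) might look roughly like the sketch below. It uses `subprocess` and `shutil.which` and is not the PR's `DockerExecutor` code; the function names, return values, injectable `run`/`which` parameters, and the "No such image" stderr check are assumptions.

```python
import shutil
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(argv: Sequence[str]) -> subprocess.CompletedProcess:
    # Capture output as text so callers can inspect stderr on failure.
    return subprocess.run(list(argv), capture_output=True, text=True)


def docker_install(
    image: str, run: Runner = _run, which: Callable[[str], Optional[str]] = shutil.which
) -> None:
    """Pull the image now so later runs don't pay the download cost."""
    if which("docker") is None:
        raise RuntimeError("Docker is not installed or not on PATH.")
    result = run(["docker", "pull", image])
    if result.returncode != 0:
        raise RuntimeError(f"docker pull {image} failed: {result.stderr.strip()}")


def docker_uninstall(
    image: str, run: Runner = _run, which: Callable[[str], Optional[str]] = shutil.which
) -> bool:
    """Remove the local image; return False if there was nothing to remove."""
    if which("docker") is None:
        return False  # Nothing to clean up without Docker.
    result = run(["docker", "rmi", image])
    if result.returncode == 0:
        return True
    # Assumed heuristic: the daemon reports "No such image" for missing images.
    if "no such image" in result.stderr.lower():
        return False
    raise RuntimeError(f"docker rmi {image} failed: {result.stderr.strip()}")
```

Injecting `run` and `which` keeps the logic testable without a Docker daemon; in normal use both default to the real `subprocess.run` wrapper and `shutil.which`.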